Energy Efficiency Optimization Method for IRS-Assisted NOMA THz Network

ABSTRACT

An energy efficiency optimization method for an IRS-assisted NOMA THz network comprises: classifying users into BS users and IRS users; defining a channel model for the BS users and a channel model for the IRS users; calculating a BS user rate and an IRS user rate respectively, and calculating a total rate of a system; proposing an optimization problem for downlink power control and IRS phase shift adjustment; and solving the optimization problem through an MADRL method. The invention puts forward an energy efficiency concept and adopts an MADRL method to maximize the overall energy efficiency of the system under the constraints of minimum rate and maximum power of each user.

BACKGROUND OF THE INVENTION

The invention relates to a network energy efficiency optimization method, in particular to an energy efficiency optimization method for an IRS-assisted NOMA THz network, and belongs to the technical field of communication.

The demand for ultra-high data rates for information and entertainment services is growing rapidly in current and future wireless communication. However, the available spectrum resources are far from sufficient to support the increasing data rates, which makes it urgent to explore new bands to break through the spectrum bottleneck. The Terahertz (THz) band has therefore attracted wide attention from the academic and industrial communities for its broadband characteristics, and is considered a basic technology of sixth-generation (6G) mobile communications. THz waves occupy the frequency range of 0.1-10 THz, and the available bandwidth is tens of times that of the millimeter-wave band. The peak data rate is expected to reach 1-10 Tbit/s. Owing to the advantages of narrow beams and large communication capacity, the THz band offers more potential to achieve ultra-high wireless transmission rates. However, due to the high frequency and short wavelength, the diffraction and penetration ability of THz waves is worse than that of microwaves and millimeter waves, so THz signals are more easily blocked by obstacles.

Due to its severe attenuation, the THz band is only suitable for short-distance communication scenarios, such as shopping malls, subway stations and other indoor places. THz applications in outdoor communications require a lot of relay equipment. Therefore, some scholars propose to combine THz technology with an intelligent reflecting surface (IRS) to make the transmission more efficient. An IRS is a reflecting surface composed of a large number of passive reflecting components. Each component can adjust its angle to reflect the signal independently. The IRS can be placed on the surface of buildings, where it effectively reflects indoor and outdoor signals. Many studies have focused on IRS-assisted communication in the THz band.

THz waves offer wide bandwidth and can serve a large number of potential users and devices, such as mobile users, industrial users and intelligent health-care terminals. However, the THz band has a major defect of small coverage, which is caused by severe attenuation of THz signals. This defect leads to a heavy transmission burden and results in a rapid increase of energy consumption. Non-orthogonal multiple access (NOMA) is a promising wireless communication technique, which allows users to share the same sub-channel simultaneously by multiplexing their communication resources in a power domain or a code domain. Compared with traditional orthogonal multiple access, NOMA is an effective technique for improving spectral efficiency and realizing massive wireless network connectivity [8]. NOMA allows more user devices to share the same sub-channel and can provide many data services to increase the utilization rate of resources in a THz network. In order to realize massive wireless connectivity and increase the resource utilization rate in THz communication, NOMA has been applied to THz networks in recent studies. For example, by introducing NOMA to a THz cellular network, a sub-channel and power allocation approach based on an alternating direction method has been put forward to optimize energy efficiency.

Inspired by the capacity enhancement of NOMA and the coverage improvement of IRS, the combination of NOMA with IRS-assisted communications has aroused significant interest. For example, some studies proposed a design of IRS-assisted NOMA downlink transmission, wherein channel vectors of marginal users are aligned in a preset spatial direction with the aid of IRSs. Other studies put forward an IRS-aided NOMA network, and proposed an energy-efficient scheme to jointly optimize the transmission beamforming of the BS and the reflection phase shift of the IRS. In addition, some studies considered IRS-enhanced millimeter-wave NOMA systems and came up with joint optimization of beamforming and power allocation.

The resource management mechanism of traditional networks has been relatively mature, but when applied to THz networks, it still has many limitations, which mainly include:

-   Limitation in the number of users and devices accessing the networks: the existing resource management mechanism is only suitable for cases where the number of users accessing the networks is small; with the increase of users and devices accessing the networks, the utilization rate of the spectrum will decrease, so the energy efficiency of the networks needs to be studied in cases of too many users and devices;
-   Severe signal attenuation: due to severe attenuation at THz frequencies, signals are quite likely to be blocked by a building, and users in the shadow of the building will be unable to receive signals from a BS, which leads to a failure of normal communication; and a large number of BSs have to be constructed for traditional THz networks to guarantee a minimum signal-to-noise ratio of users;
-   Low energy efficiency: with the increase of users and devices accessing the networks, too many BSs with high transmitting power have to be constructed and will consume too much energy, which makes the energy utilization rate of existing THz communication systems excessively low;
-   Low algorithm efficiency: traditional networks use a DQN algorithm and a single agent for reinforcement learning, each agent represents one user, training is performed on the user side without considering information exchange between users, and a large amount of traffic is occupied every time a BS is trained to transmit information.

BRIEF SUMMARY OF THE INVENTION

The technical issue to be settled by the invention is to provide an energy efficiency optimization method for an IRS-assisted NOMA THz network to overcome the defects of the prior art.

The technical solution adopted by the invention to settle the above technical issue is as follows:

An energy efficiency optimization method for an IRS-assisted NOMA THz network comprises the following steps:

-   Step 1: classifying users into BS users and IRS users;
-   Step 2: defining a channel model for the BS users and a channel model for the IRS users;
-   Step 3: calculating a BS user rate and an IRS user rate respectively, and calculating a total rate of a system;
-   Step 4: proposing an optimization problem for downlink power control and IRS phase shift adjustment; and
-   Step 5: solving the optimization problem through an MADRL method.

Further, in Step 1,

N_(B) antennas are configured for a base station, N_(U) antennas are configured for users, and the users are classified into BS users and IRS users; assume the number of the BS users is L, and the BS users are represented by a set $\mathcal{L} = \{ 1,2,\ldots,L \}$; the IRS users are divided into M clusters, wherein each cluster comprises K users and is served by G IRS elements, with $\mathcal{M} = \{ 1,2,\ldots,M \}$, $\mathcal{G} = \{ 1,2,\ldots,G \}$ and $\mathcal{K} = \{ 1,2,\ldots,K \}$; a bandwidth of the system is divided into multiple sub-channels, wherein each BS user and each IRS user respectively use one sub-channel, and it is assumed that the BS users use the first L sub-channels and the IRS users use the remaining sub-channels.

Further, in Step 2, the channel model for the BS users is specifically as follows:

A THz channel from a BS to users is modeled as a LoS path, and reflected, scattered and diffracted components are neglected due to the severe attenuation of THz signals; a channel gain from the BS to a user l on a sub-channel n is expressed as:

$h_{l,n}^{B} = \sqrt{\frac{1}{{PL}\left( {f_{n},d_{l}} \right)}}$

Wherein, PL(f_(n), d_(l)) is a path loss of the THz LoS path, and f_(n) and d_(l) are a THz frequency and a distance between the BS and the user, respectively; the path loss of the THz LoS path is formed by two parts, of which one is a free space spreading loss and the other is a molecular absorption loss, with an expression as:

PL(f _(n) ,d _(l))=L _(spread)(f _(n) ,d _(l))×L _(abs)(f _(n) ,d _(l))

Where, L_(spread)(f_(n), d_(l)) and L_(abs)(f_(n), d_(l)) meet:

$L_{spread}\left( f_{n},d_{l} \right) = \left( \frac{4\pi f_{n}d_{l}}{c} \right)^{2}$

$L_{abs}\left( f_{n},d_{l} \right) = e^{- k_{abs}(f_{n})d_{l}}$

Where, c represents a speed of light, and k_(abs)(f_(n)) represents molecular absorption coefficient;
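The BS-user channel gain above can be evaluated numerically. The following is a minimal sketch, not the inventors' implementation: the function names are hypothetical, the units (Hz, metres, 1/m) are assumptions, and the exponent sign of the absorption term is taken exactly as printed in the text.

```python
import math

C = 299_792_458.0  # speed of light c in m/s


def spreading_loss(f_hz, d_m):
    """Free-space spreading loss L_spread = (4*pi*f*d/c)^2."""
    return (4.0 * math.pi * f_hz * d_m / C) ** 2


def absorption_loss(k_abs, d_m):
    """Molecular absorption term L_abs = exp(-k_abs*d), as written in the text."""
    return math.exp(-k_abs * d_m)


def channel_gain(f_hz, d_m, k_abs):
    """Channel gain h = sqrt(1 / PL) with PL = L_spread * L_abs."""
    path_loss = spreading_loss(f_hz, d_m) * absorption_loss(k_abs, d_m)
    return math.sqrt(1.0 / path_loss)


# Example values (assumed for illustration): 0.3 THz carrier, 5 m link.
h = channel_gain(f_hz=0.3e12, d_m=5.0, k_abs=0.01)
```

With k_abs set to zero the gain reduces to c/(4πfd), which gives a quick sanity check on the units.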

Assume power transmitted to the user l through the sub-channel n is p_(l,n) ^(B), a received signal is:

$y_{l,n}^{B} = {{h_{l,n}^{B}p_{l,n}^{B}} + {h_{l,n}^{B}{\sum\limits_{{l^{\prime} = 1},{l^{\prime} \neq l}}^{L}{\sum\limits_{{n^{\prime} = {n - 1}},{n^{\prime} \neq n}}^{n + 1}p_{l^{\prime},n^{\prime}}^{B}}}} + \sigma^{2}}$

Where, σ² is additive white Gaussian noise power, and p_(l′,n′) ^(B) is power transmitted to a user l′ through the sub-channel n′.

Further, in Step 2, the channel model for the IRS users is specifically as follows:

A channel for the IRS users is composed of a channel from the BS to an IRS, a channel from the IRS to the users, and a phase shift of IRS elements; according to a classical S-V model, assume a channel vector reflected by an IRS i to a k^(th) user in an m^(th) cluster is defined as:

H=H ^(I) ΦH ^(B)

Wherein, H^(B) represents channel attenuation from the BS to the IRS, H^(I) represents channel attenuation from the IRS to the users; Φ is a G×G diagonal matrix, represents the phase shift of the IRS elements and meets $\Phi = \mathrm{diag}\left( \left\lbrack e^{j\varphi_{1}},\ldots,e^{j\varphi_{G}} \right\rbrack \right)$, wherein φ_(g) represents the phase shift of a g^(th) element; H^(B) is expressed as:

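$H^{B} = A^{I_{1}}\,\mathrm{diag}(\alpha)\,\left( A^{B} \right)^{*}$

Wherein

$\alpha = \sqrt{N_{B}G/L_{1}}\left\lbrack \alpha_{1},\ldots,\alpha_{l_{1}},\ldots,\alpha_{L_{1}} \right\rbrack^{*}$

$A^{B} = \left\lbrack \alpha^{B}(\phi_{1}),\ldots,\alpha^{B}(\phi_{l_{1}}),\ldots,\alpha^{B}(\phi_{L_{1}}) \right\rbrack$

$A^{I_{1}} = \left\lbrack \alpha^{I_{1}}(\gamma_{1}^{A}),\ldots,\alpha^{I_{1}}(\gamma_{l_{1}}^{A}),\ldots,\alpha^{I_{1}}(\gamma_{L_{1}}^{A}) \right\rbrack$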

Wherein, L₁ represents the number of scattering paths from the BS to the IRS, $\alpha_{l_{1}}$ is a complex gain from the path loss of a path l₁, $\phi_{l_{1}} \in \lbrack 0,2\pi\rbrack$ and $\gamma_{l_{1}}^{A} \in \lbrack 0,2\pi\rbrack$ represent a departure angle and an arrival angle on the path l₁ from the BS to the IRS; here, uniform linear arrays are considered, and $\alpha^{B}(\phi_{l_{1}})$ and $\alpha^{I_{1}}(\gamma_{l_{1}}^{A})$ represent array response vectors at the BS and the IRS and are expressed as:

$\alpha^{B}\left( \phi_{l_{1}} \right) = \frac{1}{\sqrt{N_{B}}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\phi_{l_{1}})},\ldots,e^{j(N_{B} - 1)(2\pi/\lambda)d\sin(\phi_{l_{1}})} \right\rbrack^{*}$

$\alpha^{I_{1}}\left( \gamma_{l_{1}}^{A} \right) = \frac{1}{\sqrt{G}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\gamma_{l_{1}}^{A})},\ldots,e^{j(G - 1)(2\pi/\lambda)d\sin(\gamma_{l_{1}}^{A})} \right\rbrack^{*}$

Where, λ is a wavelength of THz signals, and d is a distance between adjacent antenna elements or IRS elements;
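The array response vectors above share one form, so a single helper can generate them for the BS, the IRS or the user. This is an illustrative sketch: the function name is hypothetical, the half-wavelength spacing default is an assumption not stated in the text, and the conjugate transpose marked by the asterisk is left to the caller.

```python
import cmath
import math


def ula_response(n_elements, angle_rad, spacing_over_lambda=0.5):
    """Entries (1/sqrt(N)) * exp(j*n*(2*pi/lambda)*d*sin(angle)), n = 0..N-1.

    spacing_over_lambda is d/lambda; 0.5 is an assumed default.
    """
    phase_step = 2.0 * math.pi * spacing_over_lambda * math.sin(angle_rad)
    scale = 1.0 / math.sqrt(n_elements)
    return [scale * cmath.exp(1j * n * phase_step) for n in range(n_elements)]


# Broadside arrival (angle 0): every entry equals 1/sqrt(N).
broadside = ula_response(4, 0.0)
# End-fire arrival with d = lambda/2: neighbouring entries differ by a phase of pi.
endfire = ula_response(4, math.pi / 2.0)
```

The 1/sqrt(N) factor keeps each vector at unit norm, so the path gains in α and β carry all of the amplitude information.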

Similar to the BS-IRS link, an IRS-user channel is formulated as:

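$H^{I} = A^{U}\,\mathrm{diag}(\beta)\,\left( A^{I_{2}} \right)^{*}$

Wherein,

$\beta = \sqrt{N_{U}G/L_{2}}\left\lbrack \beta_{1},\ldots,\beta_{l_{2}},\ldots,\beta_{L_{2}} \right\rbrack^{*}$

$A^{U} = \left\lbrack \alpha^{U}(\psi_{1}),\ldots,\alpha^{U}(\psi_{l_{2}}),\ldots,\alpha^{U}(\psi_{L_{2}}) \right\rbrack$

$A^{I_{2}} = \left\lbrack \alpha^{I_{2}}(\gamma_{1}^{D}),\ldots,\alpha^{I_{2}}(\gamma_{l_{2}}^{D}),\ldots,\alpha^{I_{2}}(\gamma_{L_{2}}^{D}) \right\rbrack$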

Wherein, L₂ represents the number of scattering paths from the IRS to the user, $\beta_{l_{2}}$ is a complex gain from the path loss of a path l₂, and $\psi_{l_{2}} \in \lbrack 0,2\pi\rbrack$ and $\gamma_{l_{2}}^{D} \in \lbrack 0,2\pi\rbrack$ represent an arrival angle at the user and a departure angle at the IRS on the path l₂ from the IRS to the user; here, uniform linear arrays are considered, and $\alpha^{U}(\psi_{l_{2}})$ and $\alpha^{I_{2}}(\gamma_{l_{2}}^{D})$ are expressed as:

$\alpha^{U}\left( \psi_{l_{2}} \right) = \frac{1}{\sqrt{N_{U}}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\psi_{l_{2}})},\ldots,e^{j(N_{U} - 1)(2\pi/\lambda)d\sin(\psi_{l_{2}})} \right\rbrack^{*}$

$\alpha^{I_{2}}\left( \gamma_{l_{2}}^{D} \right) = \frac{1}{\sqrt{G}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\gamma_{l_{2}}^{D})},\ldots,e^{j(G - 1)(2\pi/\lambda)d\sin(\gamma_{l_{2}}^{D})} \right\rbrack^{*}$

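So, the cascaded channel from the BS through the IRS to the user is:

$H = A^{U}\,\mathrm{diag}(\beta)\,\left( A^{I_{2}} \right)^{*}\,\Phi\, A^{I_{1}}\,\mathrm{diag}(\alpha)\,\left( A^{B} \right)^{*}$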
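For single-antenna BS and users, the product H = H^(I) Φ H^(B) collapses to a sum over the G IRS elements. The sketch below illustrates how the phase shifts change that sum. The function names and the sample gains are hypothetical, and the co-phasing rule is only a familiar illustrative choice; it is not the MADRL solution described in this document.

```python
import cmath
import math


def cascaded_gain(h_irs_user, phases, h_bs_irs):
    """Scalar H = sum_g hI[g] * exp(j*phi[g]) * hB[g] for N_B = N_U = 1."""
    return sum(hi * cmath.exp(1j * ph) * hb
               for hi, ph, hb in zip(h_irs_user, phases, h_bs_irs))


def co_phasing(h_irs_user, h_bs_irs):
    """Phases in [0, 2*pi) that align every reflected path (illustration only)."""
    return [(-cmath.phase(hi * hb)) % (2.0 * math.pi)
            for hi, hb in zip(h_irs_user, h_bs_irs)]


# Assumed per-element complex gains for G = 3 elements.
h_i = [0.5 + 0.5j, -0.3j, 0.2 + 0.0j]
h_b = [1j, 0.4 - 0.1j, -0.6 + 0.0j]

aligned = cascaded_gain(h_i, co_phasing(h_i, h_b), h_b)
unshifted = cascaded_gain(h_i, [0.0, 0.0, 0.0], h_b)
```

With aligned phases the magnitude of H equals the sum of the per-element magnitudes, which by the triangle inequality is the largest value any phase setting can reach for a single user.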

For the sake of brevity, assume N_(B)=1 and N_(U)=1; the vector H is composed of the channel gains h_(i,m,k,n) ^(I) of the k^(th) user in the m^(th) cluster on the sub-channel n; p_(i,m,k,n) ^(I) represents power transmitted to the k^(th) user in the m^(th) cluster on the sub-channel n; a signal received by the k^(th) user in the m^(th) cluster on the sub-channel n is expressed as:

$y_{i,m,k,n}^{I} = h_{i,m,k,n}^{I}p_{i,m,k,n}^{I} + h_{i,m,k,n}^{I}\sum\limits_{n^{\prime} = n - 1,\, n^{\prime} \neq n}^{n + 1}p_{i,m,k,n^{\prime}}^{I} + h_{i,m,k,n}^{I}\sum\limits_{k^{\prime} = 1}^{k - 1}p_{i,m,k^{\prime},n}^{I} + \sigma^{2}$

Further, in Step 3,

The BS user rate is calculated:

Wherein, a signal-to-interference-plus-noise ratio (SINR) for signal reception of a BS user l is:

${SINR}_{l,n}^{B} = \frac{h_{l,n}^{B}p_{l,n}^{B}}{{h_{l,n}^{B}{\sum_{{l^{\prime} = 1},{l^{\prime} \neq l}}^{L}{\sum_{{n^{\prime} = {n - 1}},{n^{\prime} \neq n}}^{n + 1}p_{l^{\prime},n^{\prime}}^{B}}}} + \sigma^{2}}$

By a Shannon equation, a rate of the user l is expressed as:

$R_{l}^{B} = {B{\sum\limits_{n = 1}^{L}{\log_{2}\left( {1 + {SINR}_{l,n}^{B}} \right)}}}$

Where, B is a bandwidth.

The IRS user rate is calculated:

Wherein, a signal-to-interference-plus-noise ratio (SINR) of the k^(th) user in the m^(th) cluster is:

${SINR}_{i,m,k,n}^{I} = \frac{h_{i,m,k,n}^{I}p_{i,m,k,n}^{I}}{h_{i,m,k,n}^{I}\sum\limits_{n^{\prime} = n - 1,\, n^{\prime} \neq n}^{n + 1}p_{i,m,k,n^{\prime}}^{I} + h_{i,m,k,n}^{I}\sum\limits_{k^{\prime} = 1}^{k - 1}p_{i,m,k^{\prime},n}^{I} + \sigma^{2}}$

The rate is expressed as:

$R_{i,m,k}^{I} = B\sum\limits_{n = 1}^{L + IMK}\log_{2}\left( 1 + {SINR}_{i,m,k,n}^{I} \right)$

The total rate of the system is expressed as:

$R = \sum\limits_{l = 1}^{L}R_{l}^{B} + \sum\limits_{i = 1}^{I}\sum\limits_{m = 1}^{M}\sum\limits_{k = 1}^{K}R_{i,m,k}^{I}$
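The SINR and rate expressions of Step 3 translate directly into a few helper functions. This is a hedged sketch with hypothetical names: the interference argument is assumed to already hold the sum of adjacent-band and, for IRS users, in-cluster powers, and all numeric values are made up for illustration.

```python
import math


def sinr(gain, signal_power, interference_power, noise_power):
    """SINR = h*p / (h*interference + sigma^2), following the Step 3 formulas."""
    return gain * signal_power / (gain * interference_power + noise_power)


def shannon_rate(bandwidth_hz, sinr_values):
    """Rate = B * sum_n log2(1 + SINR_n) over the sub-channels of one user."""
    return bandwidth_hz * sum(math.log2(1.0 + s) for s in sinr_values)


def total_rate(bs_user_rates, irs_user_rates):
    """System rate R: sum of all BS user rates plus all IRS user rates."""
    return sum(bs_user_rates) + sum(irs_user_rates)


# One BS user without interference and one IRS user with interference.
r_bs = shannon_rate(1.0, [sinr(1.0, 3.0, 0.0, 1.0)])
r_irs = shannon_rate(1.0, [sinr(2.0, 1.0, 1.0, 2.0)])
r_total = total_rate([r_bs], [r_irs])
```

Here the BS user has SINR 3 and therefore a normalised rate of log2(4) = 2 bit/s/Hz, while the interference-limited IRS user reaches only log2(1.5).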

Further, in Step 4,

To maximize overall energy efficiency of a network, the optimization problem for downlink power control and the IRS phase shift adjustment is proposed, wherein total transmission power of the BS is calculated as the sum power of all the users by:

$P = {{\sum\limits_{l = 1}^{L}{\sum\limits_{n = 1}^{L}p_{l,n}^{B}}} + {\sum\limits_{i = 1}^{I}{\sum\limits_{m = 1}^{M}{\sum\limits_{k = 1}^{K}{\sum\limits_{n = {L + 1}}^{L + {IMK}}p_{i,m,k,n}^{I}}}}}}$

The energy efficiency of the system network is defined as a ratio of a sum rate to the total power of the network, and the optimization problem is formulated as:

$\begin{aligned} \max\limits_{\varphi_{i,m,g},\, p_{i,m,k,n}^{I},\, p_{l,n}^{B}} \quad & EE = \frac{R}{P} \\ \text{s.t.} \quad & C_{1}: 0 < p_{i,m,k,n}^{I} < P_{T},\ \forall i \in \mathcal{I},\ \forall m \in \mathcal{M},\ \forall k \in \mathcal{K},\ \forall n \in \mathcal{N} \\ & C_{2}: 0 < p_{l,n}^{B} < P_{T},\ \forall l \in \mathcal{L},\ \forall n \in \mathcal{N} \\ & C_{3}: R_{i,m,k}^{I} \geq R_{\min},\ \forall i \in \mathcal{I},\ \forall m \in \mathcal{M},\ \forall k \in \mathcal{K} \\ & C_{4}: R_{l}^{B} \geq R_{\min},\ \forall l \in \mathcal{L} \\ & C_{5}: \varphi_{i,m,g} \in \lbrack 0,2\pi\rbrack,\ \forall i \in \mathcal{I},\ \forall m \in \mathcal{M},\ \forall g \in \mathcal{G} \end{aligned}$

Wherein, C₁ and C₂ are power limitations of each user, C₃ and C₄ are minimum rate requirements, and C₅ is an angle range.
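The objective and the constraints C₁ to C₅ can be checked for a candidate allocation as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, all users are passed in flat lists, and the numbers are illustrative only.

```python
import math


def energy_efficiency(user_rates, user_powers):
    """EE = R / P: sum rate divided by total transmit power."""
    return sum(user_rates) / sum(user_powers)


def is_feasible(user_rates, user_powers, phases, r_min, p_max):
    """Check C1/C2 (power bounds), C3/C4 (minimum rate) and C5 (phase range)."""
    power_ok = all(0.0 < p < p_max for p in user_powers)
    rate_ok = all(r >= r_min for r in user_rates)
    phase_ok = all(0.0 <= ph <= 2.0 * math.pi for ph in phases)
    return power_ok and rate_ok and phase_ok


rates = [2.0, 4.0]             # bit/s, assumed
powers = [1.0, 2.0]            # W, assumed
phases = [0.0, math.pi, 1.5]   # rad, assumed

ee = energy_efficiency(rates, powers)
ok = is_feasible(rates, powers, phases, r_min=1.0, p_max=3.0)
```

A learning agent can use such a check to reject or penalise actions that violate a constraint before the energy efficiency is compared across iterations.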

Further, in Step 5,

The optimization problem is solved through the MADRL method: virtual agents are introduced into the BS as mappings of the users, and the virtual agents perform training to obtain optimal power and phase shift; a central control unit is configured on the BS to collect user information including channel state information (CSI), phase shift and power; a clock is set to ensure synchronous iteration during agent training, so that overall energy efficiency is calculated after each iteration; and the agents perform training according to the collected user information and real-time iteration results to realize global optimization.

Further, a Markov process taking a discrete time, a finite state space and an action space into account is used for training; basic elements of reinforcement learning are represented by a tuple $(\mathcal{S},\mathcal{A},\mathcal{R},\mathcal{P})$, where $\mathcal{S}$ represents a state space, $\mathcal{A}$ represents an action space, $\mathcal{R}$ represents a reward function, and $\mathcal{P}$ represents a state transition probability; and the state space and the action space are set as follows:

-   (1) State space: a tuple (φ, p) is defined to represent an angle of the IRS elements and the power of the BS users and the IRS users, and the state space is expressed by a formula $\mathcal{S} = \{ s \mid s = (\varphi,p) \}$, wherein φ={φ₁, . . . , φ_(j), . . . φ_(G)};
-   (2) Action space: in order to obtain a finite space, the angle and the power are discretized by:

$\varphi_{j}:\left\{ 0,\left\{ \varphi_{\min}\left( \frac{\varphi_{\max}}{\varphi_{\min}} \right)^{\frac{i}{|\varphi| - 2}} \,\middle|\, i = 0,\ldots,|\varphi| - 2 \right\} \right\}$

$p:\left\{ 0,\left\{ P_{\min}\left( \frac{P_{\max}}{P_{\min}} \right)^{\frac{i}{|P| - 2}} \,\middle|\, i = 0,\ldots,|P| - 2 \right\} \right\}$

Wherein, φ_(min) and φ_(max) are a minimum phase and a maximum phase of the IRS elements, P_(min) and P_(max) are minimum user power and maximum user power, and a discrete quantity of the angle and a discrete quantity of the power are |φ| and |P| respectively; the action space is formed as $\mathcal{A} = \{ a \mid a = (\varphi,p) \}$.

-   (3) Reward space: a difference between the overall energy efficiency in a current state and the overall energy efficiency in a previous state is defined as a reward, which is presented as $r_{t} = EE_{t + 1} - EE_{t}$, wherein EE_(t+1) and EE_(t) are energy efficiency in a state s_(t+1) and energy efficiency in a state s_(t) respectively;
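The non-uniform discretization used for the action space in item (2) places a zero level first and then spaces the remaining levels geometrically between the minimum and maximum values. The sketch below builds such a grid and the joint (φ, p) action list; the function name and the example bounds are assumptions made for illustration.

```python
import itertools
import math


def geometric_levels(x_min, x_max, n_levels):
    """Return {0} plus x_min*(x_max/x_min)^(i/(n-2)) for i = 0..n-2 (n levels total)."""
    if n_levels < 3:
        raise ValueError("need at least three levels: 0, x_min and x_max")
    ratio = x_max / x_min
    return [0.0] + [x_min * ratio ** (i / (n_levels - 2))
                    for i in range(n_levels - 1)]


# Assumed example bounds: power between 1 and 16 units, phase up to 2*pi.
power_levels = geometric_levels(1.0, 16.0, 6)
phase_levels = geometric_levels(math.pi / 8.0, 2.0 * math.pi, 6)

# Joint action space A = {a | a = (phi, p)} for one IRS element and one user.
actions = list(itertools.product(phase_levels, power_levels))
```

For the power grid above the levels are approximately 0, 1, 2, 4, 8 and 16, so the resolution is finer at low power, where the energy efficiency tends to change fastest.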

An optimal strategy π is obtained by the agents to realize a maximum cumulative reward, which is obtained by:

$R_{t}\overset{\Delta}{=}{\sum\limits_{t = 0}^{\infty}{\gamma^{t}r_{t + 1}}}$

Where, γ∈(0,1] is a discount factor for future rewards;
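The cumulative reward can be computed for a finite trajectory as shown in this small sketch. The function name is hypothetical and the infinite sum is truncated at the length of the reward list, which is an assumption made for illustration.

```python
def discounted_return(rewards, gamma):
    """R = sum_t gamma^t * r_{t+1}, with rewards = [r_1, r_2, ...]."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Energy efficiency observed over four consecutive states (assumed values).
ee_trace = [1.0, 3.0, 2.5, 4.0]
# Reward r_t = EE_{t+1} - EE_t, as defined for the reward space.
rewards = [b - a for a, b in zip(ee_trace, ee_trace[1:])]

undiscounted = discounted_return(rewards, 1.0)
discounted = discounted_return([1.0, 1.0, 1.0], 0.5)
```

With γ = 1 the rewards telescope, so the return equals the final energy efficiency minus the initial one. Maximizing the cumulative reward is therefore aligned with reaching a state of high overall energy efficiency.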

During training, the agents select actions according to the strategy π; at the state s_(t), the agents take an action a_(t) according to the strategy π, and at this moment, an action-value function Q_(π)(s_(t), a_(t)) of the agents is expressed as:

$Q_{\pi}\left( s_{t},a_{t} \right)\overset{\Delta}{=}E_{\pi}\left\lbrack R_{t} \mid s = s_{t},a = a_{t} \right\rbrack$

According to a Bellman equation,

$Q^{*}(s,a)\overset{\Delta}{=}E\left\lbrack r_{t} + \gamma\max\limits_{a^{\prime}}Q^{*}\left( s^{\prime},a^{\prime} \right) \mid s = s_{t},a = a_{t} \right\rbrack$

An evaluation of the optimal strategy is expressed as:

$Q^{*}(s,a)\overset{\Delta}{=}\max\limits_{\pi}Q^{\pi}(s,a)$

The optimal strategy is obtained by:

$\pi^{*} = \arg\max\limits_{a}Q^{*}(s,a)$

To search for an optimal strategy in a large state space and a large action space, a DQN is introduced into MADRL; the optimal action-value function is approximated by a neural network according to Q_(i)(s, a; θ)≈Q*(s, a), where θ is a weight and is updated by training; the DQN comprises a target network and a current network, which are trained by minimizing a loss function to optimize the parameter θ; the loss function is:

$loss(\theta) = \left( y_{t}^{DQN} - Q_{t}\left( s_{t},a_{t};\theta \right) \right)^{2}$

Wherein, Q_(t)(s_(t), a_(t); θ) is an output of the current network with the parameter θ at the state s_(t), and y_(t) ^(DQN) is obtained from an output of the target network with the parameter θ̂ at the state s_(t+1):

y_(t)^(DQN) = r_(t) + γQ(s_(t + 1), a_(t + 1); θ̂)

The loss function is minimized through a gradient descent algorithm, and the action-value function is approximated by the neural network until convergence.

Further, the training procedure is as follows: an action-value function Q with a random parameter θ, a target action-value function with a parameter θ̂=θ, an index T for iteration and an experience pool $\mathcal{D}$ are initialized;

For episode=1 to M do

-   (1) Initializing the state s_(t);
-   (2) For t=1 to T do
    -   a. Selecting an action by the agents according to $a_{t} = \arg\max\limits_{a}Q^{*}\left( s_{t},a;\theta \right)$;
    -   b. Performing the action a_(t) by the agents to switch from the current state s_(t) to the next state s_(t+1);
    -   c. Obtaining the reward r_(t) through data exchange between the agents and the central control unit;
    -   d. Forming a tuple (s_(t), a_(t), r_(t), s_(t+1)) and saving the tuple in the experience pool $\mathcal{D}$;
    -   e. Randomly selecting a mini-batch of tuples (s_(t), a_(t), r_(t), s_(t+1)) from the experience pool $\mathcal{D}$;
    -   f. Calculating y_(t) ^(DQN) according to $y_{t}^{DQN} = r_{t} + \gamma Q\left( s_{t + 1},a_{t + 1};\hat{\theta} \right)$;
    -   g. Updating the parameter θ by minimizing $loss(\theta) = \left( y_{t}^{DQN} - Q_{t}\left( s_{t},a_{t};\theta \right) \right)^{2}$ through a gradient descent method;
    -   h. Assigning θ to θ̂ at regular intervals, that is, θ̂=θ;
    -   i. Calculating energy efficiency EE_(t) by the central control unit;
    -   j. Calculating the reward according to r_(t)=EE_(t+1)−EE_(t);
-   Ending the cycle.
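The training loop above can be illustrated with a deliberately small stand-in. In this sketch a table replaces the neural network, a single agent picks one power level, exploration is epsilon-greedy, and the target uses the maximum over next actions as in the Bellman equation. All of these are simplifying assumptions, and the names `toy_ee` and `train_toy_dqn` as well as every numeric setting are hypothetical.

```python
import math
import random


def toy_ee(p, gain=4.0, noise=1.0, r_min=1.0):
    """Toy single-user EE = log2(1 + g*p/sigma^2) / p, zero if the rate is below r_min."""
    if p <= 0.0:
        return 0.0
    rate = math.log2(1.0 + gain * p / noise)
    return rate / p if rate >= r_min else 0.0


def train_toy_dqn(levels, ee_of, episodes=50, steps=20, gamma=0.9,
                  lr=0.1, eps=0.2, batch=8, sync_every=10, seed=0):
    """Tabular analogue of steps a to j: q is the current network, q_hat the target."""
    rng = random.Random(seed)
    n = len(levels)
    q = [[0.0] * n for _ in range(n)]        # Q(s, a; theta)
    q_hat = [row[:] for row in q]            # Q(s, a; theta_hat), starts equal
    pool = []                                # experience pool D
    updates = 0
    for _ in range(episodes):
        s = rng.randrange(n)                 # (1) initialise the state
        for _ in range(steps):
            if rng.random() < eps:           # a. explore (assumption) ...
                a = rng.randrange(n)
            else:                            # ... or act greedily on Q
                a = max(range(n), key=lambda j: q[s][j])
            s_next = a                       # b. the action sets the next power level
            r = ee_of(levels[s_next]) - ee_of(levels[s])  # c./i./j. EE difference
            pool.append((s, a, r, s_next))   # d. store the transition
            sample = rng.sample(pool, min(batch, len(pool)))  # e. mini-batch
            for bs, ba, br, bn in sample:
                y = br + gamma * max(q_hat[bn])           # f. target value
                q[bs][ba] += lr * (y - q[bs][ba])         # g. step on (y - Q)^2
            updates += 1
            if updates % sync_every == 0:    # h. periodic target synchronisation
                q_hat = [row[:] for row in q]
            s = s_next
    return q


q_table = train_toy_dqn([0.0, 0.25, 0.5, 1.0], toy_ee)
```

The factor of two from differentiating the squared loss is absorbed into the learning rate `lr`. A multi-agent version would keep one such learner per virtual agent and let the central control unit compute the shared energy efficiency after each synchronous iteration.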

Compared with the prior art, the invention has the following advantages and effects:

-   1. An IRS-aided THz cellular network is constructed through NOMA. When the BS user rate and the IRS user rate are calculated, both adjacent band interference of the users and in-band interference of each group of IRS users are taken into consideration.
-   2. To maximize the energy efficiency of the system, an optimization problem is proposed to adjust the phase angle of IRS elements and control downlink power under the constraints of maximum transmission power and minimum data rate.
-   3. The optimization problem is solved through the MADRL method. Virtual agents are introduced into the BS and perform training synchronously through the central control unit and periodic information exchange. The DQN is used, with non-uniform discretization of the optimization variables to construct an action space.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a model diagram of an IRS-aided NOMA THz network system according to the invention.

FIG. 2 is a schematic diagram of the interaction between agents and the environment according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

To expound in detail the technical solutions adopted by the invention to fulfill desired technical purposes, the technical solutions of the embodiments of the invention will be clearly and completely described below in conjunction with the drawings of the embodiments of the invention. Obviously, the embodiments in the following description are merely illustrative ones, and are not all possible ones of the invention, and the technical means or technical features in the embodiments of the invention can be substituted without creative labor. The invention will be described in detail below with reference to the accompanying drawings and embodiments.

First of all, some of the specialized terms used in the invention will be explained:

IRS-aided THz communication: many studies focus on IRS-aided communication in the THz band. Literature [1] puts forward a method for searching for an optimal phase shift of IRS elements to improve the system rate in the THz band. In Literature [2], discrete phase shifts of IRSs and pre-coders are designed at a base station (BS) to optimize spectral efficiency. In Literature [3], a space tracking approach is developed for channel estimation of IRS-assisted THz networks, so as to maximize the rate of a system.

Application of NOMA to THz communication: in order to realize massive wireless connectivity and increase the resource utilization rate in THz communication, NOMA has been applied to THz networks in recent studies. In Literature [5], NOMA is applied to THz cellular networks, and a sub-channel and power allocation scheme based on an alternating direction method is proposed to optimize energy efficiency. In addition, in Literature [4], a long-term use-center window property of THz is captured, and a central sub-band and side sub-bands of a THz window are allocated to long and short NOMA groups respectively. In NOMA, power allocated to users is related to the channel gain. Users with small channel gains are allocated high power, and users with large channel gains are allocated low power [4]. NOMA can decode or demodulate superposed signals under coverage.

IRS-aided NOMA network: Under the enlightenment of capacity enhancement of NOMA and coverage increase of IRSs, IRS-aided NOMA communication has aroused the interest of researchers. Literature [6] proposes a design of IRS-aided NOMA downlink transmission, wherein channel vectors of marginal users are aligned in a preset spatial direction with the aid of IRSs. In Literature [7], the author emphatically studies an IRS-aided NOMA network and puts forward an energy-saving scheme based on joint optimization of emitted wave beam formation of a base station (BS) and reflecting phase shift of IRSs. In addition, Literature [8] proposes IRS-enhanced millimeter-wave NOMA systems and comes up with joint optimization of beam formation and power allocation. In Literature [5], the author focuses on an IRS-aided NOMA network and puts forward an energy-saving algorithm, which maximizes the energy efficiency of a system by joint optimization of the transmitted beam formation of a BS and reflecting phase shift of IRSs. Literature [6] studies an IRS-enhanced millimeter wave NOMA system and puts forward joint optimization of active beam formation, passive beam formation and power distribution. In Literature [7], the validity of IRSs in transmission power of NOMA systems is studied, and considering the constraints of minimum signal-to-interference ratio of each user, the problem of power minimization of an IRS-aided downlink NOMA system is proposed. In Literature [8], a simple design of IRS-assisted NOMA downlink transmission is put forward. The base station generates orthogonal wave beams in the spatial direction of a channel near to users by means of traditional space division multiple access; and with the aid of IRSs, valid channel vectors of marginal users are aligned in a preset spatial direction to ensure that these wave beams can serve extra marginal users.

Introduction of reinforcement learning: Literature [9]-[11] solve an optimization problem by means of reinforcement learning. Literature [9] studies a method for power allocation in multi-cell networks, which, different from traditional optimization methods, uses deep reinforcement learning (DRL) for power allocation. The objective of this article is to maximize the overall capacity of a whole network under the condition of random and dense distribution of base stations. A wireless resource mapping method and a deep neural network Deep Q-fully-connected network (DQFCNet) are provided. Compared with power allocation based on water-filling and the Q learning method, the DQFCNet can realize a higher overall capacity. Simulation results indicate that the convergence rate and stability of the DQFCNet are remarkably improved. Literature [9] solves the problem of dynamic spectrum access by means of DRL. Specifically, in this article, a scene where different types of nodes share multiple discrete channels is studied, these nodes do not have the capacity to communicate with other nodes and have no prior knowledge about the behaviors of other nodes. The objective of each node is to maximize the long-term transmission succeed rate of its own. This problem is expressed as a Markov decision process (MDP) with unknown system dynamics. To overcome the challenge of an unknown environment and a large transition matrix, two specific DRL methods are used: deep Q network (DQN) and double deep Q network (DDQN). In addition, improved DQN techniques, including qualification tracking, prior experience and “prediction process”, are introduced. Simulation results indicate that the DQN and the DDQN can effectively learn communication modes of different nodes without prior knowledge and fulfill approximately the optimal performance. Literature [11] points out that complete system observability is necessary for optimizing radio transmission power and user data rate in wireless systems. 
Although this issue has been widely studied in this literature, there is still no practical solution for approaching the optimal performance merely by means of the observability of available parts of an actual system. The invention provides a reinforcement learning method for realizing downlink power control and rate adaptation in a cellular network to overcome this defect. The invention puts forward the design of a comprehensive learning framework, including a system state, a common reward function and an effective learning algorithm. System-level simulation results show that this design learns a power control strategy rapidly, fulfills a remarkable energy-saving effect and guarantees the fairness of users in a system.

As shown in FIG. 1 , the invention provides an energy efficiency optimization method for an IRS-assisted NOMA THz network, comprising the following steps:

Step 1: users are classified into BS users and IRS users.

N_(B) antennas are configured for a base station, N_(U) antennas are configured for users, and the users are classified into BS users and IRS users; assume the number of the BS users is L, and the BS users are represented by a set $\mathcal{L} = \{ 1,2,\ldots,L \}$; the IRS users are divided into M clusters, wherein each cluster comprises K users and is served by G IRS elements, with $\mathcal{M} = \{ 1,2,\ldots,M \}$, $\mathcal{G} = \{ 1,2,\ldots,G \}$ and $\mathcal{K} = \{ 1,2,\ldots,K \}$; a bandwidth of the system is divided into multiple sub-channels, wherein each BS user and each IRS user respectively use one sub-channel, and it is assumed that the BS users use the first L sub-channels and the IRS users use the remaining sub-channels.

Step 2: a channel model for the BS users and a channel model for the IRS users are defined.

The channel model for the BS users is specifically as follows:

A THz channel from a BS to users is modeled as a LoS path, and reflected, scattered and diffracted components are neglected due to the severe attenuation of THz signals; a channel gain from the BS to a user l on a sub-channel n is expressed as:

$h_{l,n}^{B} = \sqrt{\frac{1}{{PL}\left( {f_{n},d_{l}} \right)}}$

Wherein, PL(f_(n), d_(l)) is a path loss of the THz LoS path, and f_(n) and d_(l) are a THz frequency and a distance between the BS and the user; the path loss of the THz LoS path is formed by two parts, of which one is a free space spreading loss and the other is a molecular absorption loss, with an expression as:

PL(f _(n) ,d _(l))=L _(spread)(f _(n) ,d _(l))×L _(abs)(f _(n) ,d _(l))

Where, L_(spread)(f_(n), d_(l)) and L_(abs)(f_(n), d_(l)) meet

$L_{spread}\left( f_{n},d_{l} \right) = \left( \frac{4\pi f_{n}d_{l}}{c} \right)^{2}$

$L_{abs}\left( f_{n},d_{l} \right) = e^{- k_{abs}(f_{n})d_{l}}$

Where, c represents a speed of light, and k_(abs)(f_(n)) represents molecular absorption coefficient;

Assume the power transmitted to the user l through the sub-channel n is p_(l,n) ^(B); a received signal is:

$y_{l,n}^{B} = {{h_{l,n}^{B}p_{l,n}^{B}} + {h_{l,n}^{B}{\sum\limits_{{l^{\prime} = 1},{l^{\prime} \neq l}}^{L}{\sum\limits_{{n^{\prime} = {n - 1}},{n^{\prime} \neq n}}^{n + 1}p_{l^{\prime},n^{\prime}}^{B}}}} + \sigma^{2}}$

Where, σ² is additive white Gaussian noise power, and p_(l′,n′) ^(B) is power transmitted to a user l′ through a sub-channel n′.
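By way of a non-limiting illustration, the LoS channel gain defined above may be evaluated as in the following Python sketch. The function names and the example values are assumptions made for illustration only; the absorption term uses the exponent sign exactly as written in the expression above.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # c, in metres per second


def spreading_loss(freq_hz, dist_m):
    """Free-space spreading loss L_spread = (4*pi*f*d/c)^2."""
    return (4.0 * math.pi * freq_hz * dist_m / SPEED_OF_LIGHT) ** 2


def absorption_term(k_abs, dist_m):
    """Molecular absorption term L_abs = exp(-k_abs*d), sign as in the text."""
    return math.exp(-k_abs * dist_m)


def los_channel_gain(freq_hz, dist_m, k_abs):
    """Channel gain h = sqrt(1 / PL) with PL = L_spread * L_abs."""
    path_loss = spreading_loss(freq_hz, dist_m) * absorption_term(k_abs, dist_m)
    return math.sqrt(1.0 / path_loss)


# Hypothetical example: 0.3 THz carrier, 5 m link, k_abs = 0.01 per metre.
gain = los_channel_gain(0.3e12, 5.0, 0.01)
```

The absorption coefficient k_abs(f_n) depends on frequency and on the propagation medium; in this sketch it is simply passed in as a number.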

The channel model for the IRS users is specifically as follows:

A channel for the IRS users is composed of a channel from the BS to an IRS, a channel from the IRS to the users, and a phase shift of IRS elements; according to a classical Saleh-Valenzuela (S-V) model, assume a channel vector reflected by an IRS i to a k^(th) user in an m^(th) cluster is defined as:

H=H ^(I) ΦH ^(B)

Where, H^(B) represents channel attenuation from the BS to the IRS, H^(I) represents channel attenuation from the IRS to the users; Φ is a G×G diagonal matrix, represents the phase shift of the IRS elements and meets $\Phi = \mathrm{diag}\left( \left\lbrack e^{j\varphi_{1}},\ldots,e^{j\varphi_{G}} \right\rbrack \right)$, wherein φ_(g) represents the phase shift of a g^(th) element; H^(B) is expressed as:

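$H^{B} = A^{I_{1}}\mathrm{diag}(\alpha)A^{B*}$

Wherein

$\alpha = \sqrt{N_{B}G/L_{1}}\left\lbrack \alpha_{1},\ldots,\alpha_{l_{1}},\ldots,\alpha_{L_{1}} \right\rbrack^{*}$

$A^{B} = \left\lbrack \alpha^{B}(\phi_{1}),\ldots,\alpha^{B}(\phi_{l_{1}}),\ldots,\alpha^{B}(\phi_{L_{1}}) \right\rbrack$

$A^{I_{1}} = \left\lbrack \alpha^{I_{1}}(\gamma_{1}^{A}),\ldots,\alpha^{I_{1}}(\gamma_{l_{1}}^{A}),\ldots,\alpha^{I_{1}}(\gamma_{L_{1}}^{A}) \right\rbrack$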

Where, L₁ represents the number of scattering paths from the BS to the IRS, $\alpha_{l_{1}}$ is a complex gain from the path loss of a path l₁, $\phi_{l_{1}} \in [0,2\pi]$ and $\gamma_{l_{1}}^{A} \in [0,2\pi]$ represent a departure angle and an arrival angle on the path l₁ from the BS to the IRS; here, uniform linear arrays are considered, and $\alpha^{B}(\phi_{l_{1}})$ and $\alpha^{I_{1}}(\gamma_{l_{1}}^{A})$ represent array response vectors at the BS and the IRS and are expressed as:

$\alpha^{B}\left( \phi_{l_{1}} \right) = \frac{1}{\sqrt{N_{B}}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\phi_{l_{1}})},\ldots,e^{j(N_{B} - 1)(2\pi/\lambda)d\sin(\phi_{l_{1}})} \right\rbrack^{*}$

$\alpha^{I_{1}}\left( \gamma_{l_{1}}^{A} \right) = \frac{1}{\sqrt{G}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\gamma_{l_{1}}^{A})},\ldots,e^{j(G - 1)(2\pi/\lambda)d\sin(\gamma_{l_{1}}^{A})} \right\rbrack^{*}$

Where, λ is a wavelength of THz signals, and d is a distance between adjacent antenna elements or IRS elements;
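As a minimal sketch of the uniform-linear-array response vectors above, the following Python function returns the bracketed entries for a given number of elements and angle. The function name and the half-wavelength default spacing are illustrative assumptions; applying the conjugate transpose denoted by * is left to the caller.

```python
import cmath
import math


def ula_response(num_elements, angle_rad, spacing_over_wavelength=0.5):
    """Entries (1/sqrt(N)) * exp(j*n*(2*pi/lambda)*d*sin(angle)), n = 0..N-1.

    spacing_over_wavelength is d/lambda; 0.5 is an assumed default.
    """
    phase_step = 2.0 * math.pi * spacing_over_wavelength * math.sin(angle_rad)
    scale = 1.0 / math.sqrt(num_elements)
    return [scale * cmath.exp(1j * n * phase_step) for n in range(num_elements)]


# Hypothetical example: response of a G = 8 element IRS at an angle of 0.7 rad.
irs_response = ula_response(8, 0.7)
```

The 1/sqrt(N) factor gives the vector unit norm, so the number of elements does not change the total power carried by one path.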

Similar to the BS-IRS link, an IRS-user channel is formulated as:

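$H^{I} = A^{U}\mathrm{diag}(\beta)A^{I_{2}*}$

Wherein,

$\beta = \sqrt{N_{U}G/L_{2}}\left\lbrack \beta_{1},\ldots,\beta_{l_{2}},\ldots,\beta_{L_{2}} \right\rbrack^{*}$

$A^{U} = \left\lbrack \alpha^{U}(\psi_{1}),\ldots,\alpha^{U}(\psi_{l_{2}}),\ldots,\alpha^{U}(\psi_{L_{2}}) \right\rbrack$

$A^{I_{2}} = \left\lbrack \alpha^{I_{2}}(\gamma_{1}^{D}),\ldots,\alpha^{I_{2}}(\gamma_{l_{2}}^{D}),\ldots,\alpha^{I_{2}}(\gamma_{L_{2}}^{D}) \right\rbrack$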

Where, L₂ represents the number of scattering paths from the IRS to the user, $\beta_{l_{2}}$ is a complex gain from the path loss of a path l₂, and $\psi_{l_{2}} \in [0,2\pi]$ and $\gamma_{l_{2}}^{D} \in [0,2\pi]$ represent an arrival angle and a departure angle on the path l₂ from the IRS to the user; here, uniform linear arrays are considered, and $\alpha^{U}(\psi_{l_{2}})$ and $\alpha^{I_{2}}(\gamma_{l_{2}}^{D})$ are expressed as:

$\alpha^{U}\left( \psi_{l_{2}} \right) = \frac{1}{\sqrt{N_{U}}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\psi_{l_{2}})},\ldots,e^{j(N_{U} - 1)(2\pi/\lambda)d\sin(\psi_{l_{2}})} \right\rbrack^{*}$

$\alpha^{I_{2}}\left( \gamma_{l_{2}}^{D} \right) = \frac{1}{\sqrt{G}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\gamma_{l_{2}}^{D})},\ldots,e^{j(G - 1)(2\pi/\lambda)d\sin(\gamma_{l_{2}}^{D})} \right\rbrack^{*}$

So, the channel for the IRS users is:

$H = A^{U}\mathrm{diag}(\beta)A^{I_{2}*}\Phi A^{I_{1}}\mathrm{diag}(\alpha)A^{B*}$

For the sake of brevity, assume N_(B)=1 and N_(U)=1; the vector H is then composed of the channel gains h_(i,m,k,n) ^(I) of the k^(th) user in the m^(th) cluster on the sub-channel n; p_(i,m,k,n) ^(I) represents power transmitted to the k^(th) user in the m^(th) cluster on the sub-channel n; a signal received by the k^(th) user in the m^(th) cluster on the sub-channel n is expressed as:

$y_{i,m,k,n}^{I} = h_{i,m,k,n}^{I}p_{i,m,k,n}^{I} + h_{i,m,k,n}^{I}\sum_{n^{\prime} = n - 1,n^{\prime} \neq n}^{n + 1}p_{i,m,k,n^{\prime}}^{I} + h_{i,m,k,n}^{I}\sum_{k^{\prime} = 1}^{k - 1}p_{i,m,k^{\prime},n}^{I} + \sigma^{2}$
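For the single-antenna case N_(B)=1 and N_(U)=1, the cascaded channel H=H^(I)ΦH^(B) reduces to a scalar sum over the G IRS elements. The following Python sketch illustrates this under stated assumptions: all function names, path gains and angles are hypothetical, the path gains are taken as already conjugated, and the phase-alignment rule is a well-known heuristic shown only for comparison, not the MADRL solution of the invention.

```python
import cmath
import math


def ula_response(num_elements, angle_rad, spacing_over_wavelength=0.5):
    """Unit-norm uniform-linear-array response entries."""
    step = 2.0 * math.pi * spacing_over_wavelength * math.sin(angle_rad)
    scale = 1.0 / math.sqrt(num_elements)
    return [scale * cmath.exp(1j * n * step) for n in range(num_elements)]


def path_sum(gains, angles, num_irs, conjugate=False):
    """Sum over paths of sqrt(G/L) * gain * IRS response (length-G list)."""
    scale = math.sqrt(num_irs / len(gains))
    total = [0j] * num_irs
    for gain, angle in zip(gains, angles):
        response = ula_response(num_irs, angle)
        for g in range(num_irs):
            entry = response[g].conjugate() if conjugate else response[g]
            total[g] += scale * gain * entry
    return total


def cascaded_gain(h_irs_user, phases, h_bs_irs):
    """Scalar H = H^I * Phi * H^B with Phi = diag(exp(j*phi_g))."""
    return sum(hi * cmath.exp(1j * phi) * hb
               for hi, phi, hb in zip(h_irs_user, phases, h_bs_irs))


def aligned_phases(h_irs_user, h_bs_irs):
    """Phase shifts in [0, 2*pi) that make all reflected terms add in phase."""
    return [(-cmath.phase(hi * hb)) % (2.0 * math.pi)
            for hi, hb in zip(h_irs_user, h_bs_irs)]


# Hypothetical two-path example with G = 8 IRS elements.
G = 8
h_b = path_sum([1.0, 0.4j], [0.3, 1.1], G)                  # BS -> IRS
h_i = path_sum([0.8, 0.2], [0.7, 2.0], G, conjugate=True)   # IRS -> user
gain_aligned = abs(cascaded_gain(h_i, aligned_phases(h_i, h_b), h_b))
gain_flat = abs(cascaded_gain(h_i, [0.0] * G, h_b))
```

With aligned phases the magnitude of H equals the sum of the per-element magnitudes, which is the largest value any phase setting can reach for a single user; with several users sharing the IRS, no single setting aligns all of them, which is one reason a learning-based search is used.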

Step 3: a BS user rate and an IRS user rate are calculated respectively, and a total rate of a system is calculated.

The BS user rate is calculated:

Wherein, a signal to interference plus noise ratio (SINR) for signal reception of a BS user l is:

${SINR}_{l,n}^{B} = \frac{h_{l,n}^{B}p_{l,n}^{B}}{{h_{l,n}^{B}{\sum_{{l^{\prime} = 1},{l^{\prime} \neq l}}^{L}{\sum_{{n^{\prime} = {n - 1}},{n^{\prime} \neq n}}^{n + 1}p_{l^{\prime},n^{\prime}}^{B}}}} + \sigma^{2}}$

By a Shannon equation, a rate of the user l is expressed as:

$R_{l}^{B} = {B{\sum\limits_{n = 1}^{L}{\log_{2}\left( {1 + {SINR}_{l,n}^{B}} \right)}}}$

Where, B is a bandwidth;
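The BS user SINR and rate expressions above may be evaluated as in the following Python sketch. The function names, the 0-based indexing and the power table layout are illustrative assumptions; neighbouring sub-channels that fall outside the table are simply skipped.

```python
import math


def bs_sinr(gain, power, user, channel, noise_power):
    """SINR of a BS user on one sub-channel.

    power[l][n] is the power sent to BS user l on sub-channel n (0-based).
    Interference comes from other users on the two adjacent sub-channels.
    """
    interference = 0.0
    for other in range(len(power)):
        if other == user:
            continue
        for adjacent in (channel - 1, channel + 1):
            if 0 <= adjacent < len(power[other]):
                interference += power[other][adjacent]
    return gain * power[user][channel] / (gain * interference + noise_power)


def bs_rate(gain, power, user, bandwidth_hz, noise_power):
    """Rate R = B * sum_n log2(1 + SINR_n) over all listed sub-channels."""
    return bandwidth_hz * sum(
        math.log2(1.0 + bs_sinr(gain, power, user, n, noise_power))
        for n in range(len(power[user]))
    )


# Hypothetical example: two BS users, each using its own sub-channel.
power_table = [[1.0, 0.0], [0.0, 1.0]]
rate_user0 = bs_rate(1.0, power_table, 0, 1.0e9, 1.0)
```

In this example user 0 sees the power of user 1 on the adjacent sub-channel as interference, so its SINR on sub-channel 0 is 1/(1+1).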

The IRS user rate is calculated:

Wherein, a signal to interference plus noise ratio (SINR) of the k^(th) user in the m^(th) cluster is:

${SINR}_{i,m,k,n}^{I} = \frac{h_{i,m,k,n}^{I}p_{i,m,k,n}^{I}}{h_{i,m,k,n}^{I}\sum_{n^{\prime} = n - 1,n^{\prime} \neq n}^{n + 1}p_{i,m,k,n^{\prime}}^{I} + h_{i,m,k,n}^{I}\sum_{k^{\prime} = 1}^{k - 1}p_{i,m,k^{\prime},n}^{I} + \sigma^{2}}$

The rate is expressed as:

R _(i,m,k) ^(I) =BΣ _(n=1) ^(L+IMK) log₂(1+SINR_(i,m,k,n) ^(I))

The total rate of the system is expressed as:

R=Σ _(l=1) ^(L) R _(l) ^(B)+Σ_(i=1) ^(I)Σ_(m=1) ^(M)Σ_(k=1) ^(K) R _(i,m,k) ^(I)

Step 4: an optimization problem for downlink power control and IRS phase shift adjustment is proposed.

To maximize overall energy efficiency of a network, the optimization problem for downlink power control and IRS phase shift adjustment is proposed, wherein total transmission power of the BS is a sum of the power of all the users and is expressed as:

$P = \sum\limits_{l = 1}^{L}\sum\limits_{n = 1}^{L}p_{l,n}^{B} + \sum\limits_{i = 1}^{I}\sum\limits_{m = 1}^{M}\sum\limits_{k = 1}^{K}\sum\limits_{n = L + 1}^{L + IMK}p_{i,m,k,n}^{I}$

The energy efficiency of the system network is defined as a ratio of a sum rate to the total power of the network, and the optimization problem is formulated as:

$\max\limits_{\varphi_{i,m,g},p_{i,m,k,n}^{I},p_{l,n}^{B}}EE = \frac{R}{P}$

$C_{1}:0 < p_{i,m,k,n}^{I} < P_{T},\forall i \in \mathcal{I},\forall m \in \mathcal{M},\forall k \in \mathcal{K},\forall n \in \mathcal{N}$

$C_{2}:0 < p_{l,n}^{B} < P_{T},\forall l \in \mathcal{L},\forall n \in \mathcal{N}$

$C_{3}:R_{i,m,k}^{I} \geq R_{\min},\forall i \in \mathcal{I},\forall m \in \mathcal{M},\forall k \in \mathcal{K}$

$C_{4}:R_{l}^{B} \geq R_{\min},\forall l \in \mathcal{L}$

$C_{5}:\varphi_{i,m,g} \in \left\lbrack 0,2\pi \right\rbrack,\forall i \in \mathcal{I},\forall m \in \mathcal{M},\forall g \in \mathcal{G}$

Where, C₁ and C₂ are power limitations of each user, C₃ and C₄ are minimum rate requirements, and C₅ is an angle range.
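The objective and the constraints C₁ to C₅ may be checked for a candidate solution as in the following Python sketch. The flat lists of rates, powers and phases and the function names are illustrative assumptions; the sketch only evaluates a given point and does not solve the optimization problem.

```python
import math


def energy_efficiency(rates, powers):
    """EE = R / P: sum rate divided by total transmit power."""
    total_power = sum(powers)
    if total_power <= 0.0:
        raise ValueError("total power must be positive")
    return sum(rates) / total_power


def is_feasible(rates, powers, phases, p_max, r_min):
    """Check C1/C2 (power), C3/C4 (minimum rate) and C5 (phase range)."""
    power_ok = all(0.0 < p < p_max for p in powers)
    rate_ok = all(r >= r_min for r in rates)
    phase_ok = all(0.0 <= phi <= 2.0 * math.pi for phi in phases)
    return power_ok and rate_ok and phase_ok


# Hypothetical candidate: two users, two IRS elements.
ee = energy_efficiency([2.0, 4.0], [1.0, 2.0])
ok = is_feasible([2.0, 4.0], [1.0, 2.0], [0.0, 3.0], p_max=5.0, r_min=1.0)
```

Because the objective is a ratio and the rates depend on the phase shifts through the channel gains, the problem is non-convex, which motivates the learning-based solution of Step 5.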

Step 5: the optimization problem is solved through an MADRL method.

The optimization problem is solved through the MADRL method: virtual agents are introduced into the BS as mappings of the users and perform training to obtain optimal power and phase shift; a central control unit is configured on the BS to collect user information including channel state information (CSI), phase shift and power; a clock is set to ensure synchronous iteration during agent training, so that overall energy efficiency is calculated after each iteration; and the agents perform training according to the collected user information and real-time iteration results to realize global optimization.

A Markov process taking a discrete time, a finite state space and an action space into account is used for training; basic elements of reinforcement learning are represented by a tuple ($\mathcal{S}$, $\mathcal{A}$, $\mathcal{R}$, $\mathcal{P}$), where $\mathcal{S}$ represents a state space, $\mathcal{A}$ represents an action space, $\mathcal{R}$ represents a reward function, and $\mathcal{P}$ represents a state transition probability; and the state space and the action space are set as follows:

1) State space: a tuple (φ, p) is defined to represent an angle of the IRS elements and the power of the BS users and the IRS users, and the state space is expressed by a formula as $\mathcal{S} = \{ s \mid s = (\varphi,p)\}$, wherein φ={φ₁, . . . , φ_(j), . . . , φ_(G)};

2) Action space: in order to obtain a finite space, the angle and the power are discretized by:

$\varphi_{j}:\left\{ 0,\left\{ \varphi_{\min}\left( \frac{\varphi_{\max}}{\varphi_{\min}} \right)^{\frac{i}{|\varphi| - 2}} \mid i = 0,\ldots,|\varphi| - 2 \right\} \right\}$

$p:\left\{ 0,\left\{ P_{\min}\left( \frac{P_{\max}}{P_{\min}} \right)^{\frac{i}{|P| - 2}} \mid i = 0,\ldots,|P| - 2 \right\} \right\}$

Wherein, φ_(min) and φ_(max) are a minimum phase and a maximum phase of the IRS elements, P_(min) and P_(max) are minimum user power and maximum user power, and a discrete quantity of the angle and a discrete quantity of the power are |φ| and |P| respectively; the action space is formed as $\mathcal{A} = \{ a \mid a = (\varphi,p)\}$;

3) Reward space: a difference between the overall energy efficiency in a current state and the overall energy efficiency in a previous state is defined as a reward, which is presented as r_(t)=EE_(t+1)−EE_(t), wherein EE_(t+1) and EE_(t) are energy efficiency in a state s_(t+1) and energy efficiency in a state s_(t) respectively;

An optimal strategy π is obtained by the agents to realize a maximum cumulative reward, which is obtained by:

$R_{t}\overset{\Delta}{=}\sum\limits_{k = 0}^{\infty}\gamma^{k}r_{t + k + 1}$

Where, γ∈(0,1] is a discount factor for future rewards;
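The discretization of item 2) and the cumulative reward above may be sketched in Python as follows. The function names are illustrative assumptions; the grid requires a strictly positive minimum value and at least three levels, since the exponent divides by the number of levels minus two.

```python
def geometric_levels(v_min, v_max, count):
    """Return {0} plus count-1 geometrically spaced levels from v_min to v_max.

    Implements v_min * (v_max / v_min) ** (i / (count - 2)), i = 0..count-2.
    """
    if count < 3 or v_min <= 0.0 or v_max < v_min:
        raise ValueError("need count >= 3 and 0 < v_min <= v_max")
    ratio = v_max / v_min
    return [0.0] + [v_min * ratio ** (i / (count - 2)) for i in range(count - 1)]


def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical example: four power levels between 1 and 100 (arbitrary units).
power_levels = geometric_levels(1.0, 100.0, 4)
episode_return = discounted_return([1.0, 1.0, 1.0], 0.5)
```

A geometric grid places more levels near the minimum value than a uniform grid does, which gives finer control at low power.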

During training, the agents select an action according to the strategy π; at the state s_(t), the agents take an action a_(t) according to the strategy π, and at this moment, an action-value function Q_(π)(s_(t), a_(t)) of the agents is expressed as:

$Q_{\pi}(s_{t},a_{t})\overset{\Delta}{=}E_{\pi}\left\lbrack R_{t} \mid s = s_{t},a = a_{t} \right\rbrack$

According to a Bellman equation,

$Q^{*}(s,a)\overset{\Delta}{=}E\left\lbrack r_{t} + \gamma\max\limits_{a^{\prime}}Q^{*}(s^{\prime},a^{\prime}) \mid s = s_{t},a = a_{t} \right\rbrack$

An evaluation of the optimal strategy is expressed as:

$Q^{*}(s,a)\overset{\Delta}{=}\max\limits_{\pi}Q^{\pi}(s,a)$

The optimal strategy is obtained by:

$\pi^{*} = \arg\max\limits_{a}Q^{*}(s,a)$

To search for an optimal strategy in a large state space and a large action space, a DQN is introduced into MADRL; the optimal strategy and the value function are approximated by one function according to Q_(i)(s, a; θ)≈Q*(s, a), where θ is a weight and is updated by training; the DQN comprises a target network and a current network, which are trained by minimizing a loss function to optimize the parameter θ; the loss function is:

$loss(\theta) = \left( y_{t}^{DQN} - Q_{t}(s_{t},a_{t};\theta) \right)^{2}$

Wherein, Q_(t)(s_(t), a_(t); θ) is an output of the current network with the parameter θ at the state s_(t), and y_(t) ^(DQN) is an output of the target network with the parameter {circumflex over (θ)} at the state s_(t+1):

$y_{t}^{DQN} = r_{t} + \gamma Q(s_{t + 1},a_{t + 1};\hat{\theta})$

The loss function is minimized through a gradient descent algorithm, and the action-value function is approximated by the neural network until convergence.
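The target value and the loss above may be computed as in the following Python sketch. It assumes that a_(t+1) is the greedy action under the target network, which is the usual DQN choice and is not spelled out in the text; the function names and numbers are illustrative only.

```python
def dqn_target(reward, next_q_values, gamma):
    """y = r + gamma * max over a' of Q(s', a'; theta_hat).

    next_q_values holds the target-network outputs for every action in s'.
    """
    return reward + gamma * max(next_q_values)


def squared_loss(target, q_value):
    """loss = (y - Q(s, a; theta)) ** 2 for a single transition."""
    return (target - q_value) ** 2


# Hypothetical transition: reward 1.0, two actions available in the next state.
y = dqn_target(1.0, [0.5, 2.0], 0.9)
loss = squared_loss(y, 2.0)
```

Holding the target-network parameters fixed between periodic updates keeps y from moving at every gradient step, which is the stabilizing role of the separate target network.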

An action-value function Q with a random parameter θ, a target action-value function with a parameter {circumflex over (θ)}=θ, an index T for iteration and an experience pool are initialized;

For episode=1 to M

1) Initializing the state s_(t)

2) For t=1 to T

a. Selecting an action by the agents according to $a_{t} = \arg\max\limits_{a}Q(s_{t},a;\theta)$;

b. Performing the action a_(t) by the agents to switch from the current state s_(t) to the next state s_(t+1);

c. Obtaining the reward r_(t) through data exchange between the agents and the central control unit;

d. Forming a tuple (s_(t), a_(t), r_(t), s_(t+1)) by s_(t), a_(t), r_(t), s_(t+1), and saving the tuple (s_(t), a_(t), r_(t), s_(t+1)) in the experience pool;

e. Randomly selecting a mini-batch of tuples (s_(t), a_(t), r_(t), s_(t+1)) from the experience pool;

f. Calculating y_(t) ^(DQN) according to $y_{t}^{DQN} = r_{t} + \gamma Q(s_{t + 1},a_{t + 1};\hat{\theta})$;

g. Updating the parameter θ in loss(θ)=(y_(t) ^(DQN)−Q_(t)(s_(t), a_(t); θ))² through a gradient descent method;

h. Assigning θ to {circumflex over (θ)} at regular intervals to update the target network, that is {circumflex over (θ)}=θ;

i. Calculating energy efficiency EE_(t) by the central control unit;

j. Calculating the reward according to r_(t)=EE_(t+1)−EE_(t);

Ending the cycle.
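The training loop above may be illustrated by the following single-agent Python sketch under loudly stated assumptions: a lookup table stands in for the neural network, a toy function stands in for the energy efficiency computed by the central control unit, and epsilon-greedy exploration is added to the purely greedy selection of step a. All names and numbers are hypothetical.

```python
import random


def toy_energy_efficiency(level):
    """Hypothetical stand-in for EE; it peaks at power level 3 of 0..5."""
    return -float((level - 3) ** 2)


def train(episodes=200, steps=20, gamma=0.9, lr=0.1, eps=0.2,
          batch_size=8, sync_every=10, seed=0):
    rng = random.Random(seed)
    n_states, moves = 6, (-1, 0, 1)            # lower, keep or raise the level
    q = [[0.0] * len(moves) for _ in range(n_states)]   # current "network"
    q_target = [row[:] for row in q]                    # target "network"
    pool = []                                           # experience pool
    step_count = 0
    for _ in range(episodes):
        s = rng.randrange(n_states)                     # 1) initial state
        for _ in range(steps):
            # a. action selection (epsilon-greedy is an added assumption)
            if rng.random() < eps:
                a = rng.randrange(len(moves))
            else:
                a = max(range(len(moves)), key=lambda i: q[s][i])
            # b. apply the action, clamping to the valid levels
            s_next = min(max(s + moves[a], 0), n_states - 1)
            # c., i., j. reward is the change in (toy) energy efficiency
            r = toy_energy_efficiency(s_next) - toy_energy_efficiency(s)
            # d. store the transition
            pool.append((s, a, r, s_next))
            # e. sample a mini-batch from the pool
            batch = rng.sample(pool, min(batch_size, len(pool)))
            for bs, ba, br, bn in batch:
                # f. target from the target table
                y = br + gamma * max(q_target[bn])
                # g. gradient step on (y - Q)^2; the factor 2 is folded into lr
                q[bs][ba] += lr * (y - q[bs][ba])
            # h. periodic copy of the current table into the target table
            step_count += 1
            if step_count % sync_every == 0:
                q_target = [row[:] for row in q]
            s = s_next
    return q


q_table = train()
```

In the multi-agent setting of the invention, each virtual agent would run such a loop while the clock keeps the iterations synchronous, and the reward would come from the overall energy efficiency rather than from a toy function.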

The above embodiments are merely preferred ones of the invention, and are not intended to limit the invention in any form. Although the invention has been disclosed above with reference to the preferred embodiments, these embodiments are not used to limit the invention. Anyone skilled in the art can obtain equivalent embodiments by slightly changing or modifying the technical contents disclosed above without departing from the scope of the technical solutions of the invention. Any simple amendments, equivalent substitutions and improvements made to the above embodiments based on the spirit and principle of the invention according to the technical essence of the invention should still fall within the protection scope of the technical solutions of the invention.

Literature list in this application:

-   [1] W. Chen, X. Ma, Z. Li, and N. Kuang, “Sum-rate maximization for intelligent reflecting surface based terahertz communication systems,” IEEE Int. Conf. Commun., pp. 153-157, August 2019.
-   [2] W. Chen, Z. Chen, X. Ma, Y. Chi, and Z. Li, “Spectral efficiency optimization for intelligent reflecting surface aided multi-input multioutput terahertz system,” Microwave and Optical Technology Lett., vol. 62, no. 8, pp. 2754-2759, August 2020.
-   [3] X. Ma, Z. Chen, W. Chen, Z. Li, Y. Chi, C. Han, and S. Li, “Joint channel estimation and data rate maximization for intelligent reflecting surface assisted terahertz MIMO communication systems,” IEEE Access, vol. 8, pp. 99565-99581, August 2020.
-   [4] X. Zhang, C. Han, and X. Wang, “Joint beamforming-power-bandwidth allocation in terahertz NOMA networks,” Int. Conf. on Sensing, Commun., and Netw., pp. 1-9, June 2019.
-   [5] H. Zhang, Y. Duan, K. Long, and V. C. M. Leung, “Energy efficient resource allocation in terahertz downlink NOMA systems,” IEEE Trans. Commun., vol. 69, no. 2, pp. 1375-1384, February 2021.
-   [6] Z. Ding and H. V. Poor, “A simple design of IRS-NOMA transmission,” IEEE Commun. Lett., vol. 24, no. 5, pp. 1119-1123, May 2020.
-   [7] F. Fang, Y. Xu, Q. Pham, and Z. Ding, “Energy-efficient design of IRS-NOMA networks,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 14088-14092, November 2020.
-   [8] J. Zuo, Y. Liu, E. Basar, and O. A. Dobre, “Intelligent reflecting surface enhanced millimeter-wave NOMA systems,” IEEE Commun. Lett., vol. 24, no. 11, pp. 2632-2636, November 2020.
-   [9] Y. Zhang, C. Kang, T. Ma, Y. Teng, and D. Guo, “Power allocation in multi-cell networks using deep reinforcement learning,” IEEE Veh. Technol. Conf., pp. 1-6, August 2018.
-   [10] Y. Xu, J. Yu, W. C. Headley, and R. M. Buehrer, “Deep reinforcement learning for dynamic spectrum access in wireless networks,” IEEE Military Commun. Conf., pp. 207-212, October 2018.
-   [11] E. Ghadimi, F. D. Calabrese, G. Peters, and P. Soldati, “A reinforcement learning approach to power control and rate adaptation in cellular networks,” IEEE Int. Conf. Commun., pp. 1-7, May 2017.

What is claimed is:
 1. An energy efficiency optimization method for an IRS-assisted NOMA THz network, comprising the following steps: Step 1: classifying users into BS users and IRS users; Step 2: defining a channel model for the BS users and a channel model for the IRS users; Step 3: calculating a BS user rate and an IRS user rate respectively, and calculating a total rate of a system; Step 4: proposing an optimization problem for downlink power control and IRS phase shift adjustment; and Step 5: solving the optimization problem through an MADRL method.
 2. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 1, wherein in Step 1, N_(B) antennas are configured for a base station, N_(U) antennas are configured for users and the users are classified into BS users and IRS users; assume the number of the BS users is L, the BS users are represented by a set $\mathcal{L} = \{ 1,2,\ldots,L\}$; the IRS users are divided into M clusters, wherein each cluster comprises K users and is served by G IRS elements, and $\mathcal{M} = \{ 1,2,\ldots,M\}$, $\mathcal{G} = \{ 1,2,\ldots,G\}$, $\mathcal{K} = \{ 1,2,\ldots,K\}$; a bandwidth of the system is divided into multiple sub-channels, wherein each BS user and each IRS user respectively use one sub-channel, and assume the BS users use the first L sub-channels and the IRS users use the remaining sub-channels.
 3. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 1, wherein in Step 2, the channel model for the BS users is specifically as follows: a THz channel from a BS to users is modeled as a LoS path, with the reflected, scattered and diffracted components neglected due to severe attenuation of THz; a channel gain from the BS to a user l at a sub-channel n is expressed as: $h_{l,n}^{B} = \sqrt{\frac{1}{{PL}\left( {f_{n},d_{l}} \right)}}$ wherein, PL(f_(n), d_(l)) is a path loss of the THz LoS path, and f_(n) and d_(l) are a THz frequency and a distance between the BS and the user; the path loss of the THz LoS path is formed by two parts, of which one is a free space spreading loss and the other is a molecular absorption loss, with an expression as: PL(f _(n) ,d _(l))=L _(spread)(f _(n) ,d _(l))×L _(abs)(f _(n) ,d _(l)) where, L_(spread)(f_(n), d_(l)) and L_(abs)(f_(n), d_(l)) meet: $L_{spread}\left( f_{n},d_{l} \right) = \left( \frac{4\pi f_{n}d_{l}}{c} \right)^{2}$ and $L_{abs}\left( f_{n},d_{l} \right) = e^{- k_{abs}(f_{n})d_{l}}$ where, c represents a speed of light, and k_(abs)(f_(n)) represents molecular absorption coefficient; assume power transmitted to the user l through the sub-channel n is p_(l,n) ^(B), a received signal is: $y_{l,n}^{B} = {{h_{l,n}^{B}p_{l,n}^{B}} + {h_{l,n}^{B}{\sum\limits_{{l^{\prime} = 1},{l^{\prime} \neq l}}^{L}{\sum\limits_{{n^{\prime} = {n - 1}},{n^{\prime} \neq n}}^{n + 1}p_{l^{\prime},n^{\prime}}^{B}}}} + \sigma^{2}}$ where, σ² is additive white Gaussian noise power, and p_(l′,n′) ^(B) is power transmitted to a user l′ through a sub-channel n′.
 4. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 3, wherein in Step 2, the channel model for the IRS users is specifically as follows: a channel for the IRS users is composed of a channel from the BS to an IRS, a channel from the IRS to the users, and a phase shift of IRS elements; according to a classical S-V model, assume a channel vector reflected by an IRS i to a k^(th) user in an m^(th) cluster is defined as: H=H ^(I) ΦH ^(B) where, H^(B) represents channel attenuation from the BS to the IRS, H^(I) represents channel attenuation from the IRS to the users; Φ is a G×G diagonal matrix, represents the phase shift of the IRS elements and meets $\Phi = \mathrm{diag}\left( \left\lbrack e^{j\varphi_{1}},\ldots,e^{j\varphi_{G}} \right\rbrack \right)$, where φ_(g) represents the phase shift of a g^(th) element; H^(B) is expressed as: $H^{B} = A^{I_{1}}\mathrm{diag}(\alpha)A^{B*}$ where $\alpha = \sqrt{N_{B}G/L_{1}}\left\lbrack \alpha_{1},\ldots,\alpha_{l_{1}},\ldots,\alpha_{L_{1}} \right\rbrack^{*}$, $A^{B} = \left\lbrack \alpha^{B}(\phi_{1}),\ldots,\alpha^{B}(\phi_{l_{1}}),\ldots,\alpha^{B}(\phi_{L_{1}}) \right\rbrack$ and $A^{I_{1}} = \left\lbrack \alpha^{I_{1}}(\gamma_{1}^{A}),\ldots,\alpha^{I_{1}}(\gamma_{l_{1}}^{A}),\ldots,\alpha^{I_{1}}(\gamma_{L_{1}}^{A}) \right\rbrack$ where, L₁ represents the number of scattering paths from the BS to the IRS, $\alpha_{l_{1}}$ is a complex gain from the path loss of a path l₁, $\phi_{l_{1}} \in [0,2\pi]$ and $\gamma_{l_{1}}^{A} \in [0,2\pi]$ represent a departure angle and an arrival angle on the path l₁ from the BS to the IRS; here, uniform linear arrays are considered, and $\alpha^{B}(\phi_{l_{1}})$ and $\alpha^{I_{1}}(\gamma_{l_{1}}^{A})$ represent array response vectors at the BS and the IRS and are expressed as: $\alpha^{B}\left( \phi_{l_{1}} \right) = \frac{1}{\sqrt{N_{B}}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\phi_{l_{1}})},\ldots,e^{j(N_{B} - 1)(2\pi/\lambda)d\sin(\phi_{l_{1}})} \right\rbrack^{*}$ and $\alpha^{I_{1}}\left( \gamma_{l_{1}}^{A} \right) = \frac{1}{\sqrt{G}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\gamma_{l_{1}}^{A})},\ldots,e^{j(G - 1)(2\pi/\lambda)d\sin(\gamma_{l_{1}}^{A})} \right\rbrack^{*}$ where, λ is a wavelength of THz signals, and d is a distance between adjacent antenna elements or IRS elements; similar to the BS-IRS link, an IRS-user channel is formulated as: $H^{I} = A^{U}\mathrm{diag}(\beta)A^{I_{2}*}$ where, $\beta = \sqrt{N_{U}G/L_{2}}\left\lbrack \beta_{1},\ldots,\beta_{l_{2}},\ldots,\beta_{L_{2}} \right\rbrack^{*}$, $A^{U} = \left\lbrack \alpha^{U}(\psi_{1}),\ldots,\alpha^{U}(\psi_{l_{2}}),\ldots,\alpha^{U}(\psi_{L_{2}}) \right\rbrack$ and $A^{I_{2}} = \left\lbrack \alpha^{I_{2}}(\gamma_{1}^{D}),\ldots,\alpha^{I_{2}}(\gamma_{l_{2}}^{D}),\ldots,\alpha^{I_{2}}(\gamma_{L_{2}}^{D}) \right\rbrack$ where, L₂ represents the number of scattering paths from the IRS to the user, $\beta_{l_{2}}$ is a complex gain from the path loss of a path l₂, and $\psi_{l_{2}} \in [0,2\pi]$ and $\gamma_{l_{2}}^{D} \in [0,2\pi]$ represent an arrival angle and a departure angle on the path l₂ from the IRS to the user; here, uniform linear arrays are considered, and $\alpha^{U}(\psi_{l_{2}})$ and $\alpha^{I_{2}}(\gamma_{l_{2}}^{D})$ are expressed as: $\alpha^{U}\left( \psi_{l_{2}} \right) = \frac{1}{\sqrt{N_{U}}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\psi_{l_{2}})},\ldots,e^{j(N_{U} - 1)(2\pi/\lambda)d\sin(\psi_{l_{2}})} \right\rbrack^{*}$ and $\alpha^{I_{2}}\left( \gamma_{l_{2}}^{D} \right) = \frac{1}{\sqrt{G}}\left\lbrack 1,e^{j(2\pi/\lambda)d\sin(\gamma_{l_{2}}^{D})},\ldots,e^{j(G - 1)(2\pi/\lambda)d\sin(\gamma_{l_{2}}^{D})} \right\rbrack^{*}$ so, the channel for the IRS users is: $H = A^{U}\mathrm{diag}(\beta)A^{I_{2}*}\Phi A^{I_{1}}\mathrm{diag}(\alpha)A^{B*}$ for the sake of brevity, assume N_(B)=1 and N_(U)=1, the vector H is composed of the channel gains h_(i,m,k,n) ^(I) of the k^(th) user in the m^(th) cluster on the sub-channel n; p_(i,m,k,n) ^(I) represents power transmitted to the k^(th) user in the m^(th) cluster on the sub-channel n; a signal received by the k^(th) user in the m^(th) cluster on the sub-channel n is expressed as: $y_{i,m,k,n}^{I} = h_{i,m,k,n}^{I}p_{i,m,k,n}^{I} + h_{i,m,k,n}^{I}\sum_{n^{\prime} = n - 1,n^{\prime} \neq n}^{n + 1}p_{i,m,k,n^{\prime}}^{I} + h_{i,m,k,n}^{I}\sum_{k^{\prime} = 1}^{k - 1}p_{i,m,k^{\prime},n}^{I} + \sigma^{2}$
 5. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 4, wherein in Step 3, the BS user rate is calculated: wherein, a signal to interference plus noise ratio (SINR) for signal reception of a BS user l is: ${SINR}_{l,n}^{B} = \frac{h_{l,n}^{B}p_{l,n}^{B}}{{h_{l,n}^{B}{\sum_{{l^{\prime} = 1},{l^{\prime} \neq l}}^{L}{\sum_{{n^{\prime} = {n - 1}},{n^{\prime} \neq n}}^{n + 1}p_{l^{\prime},n^{\prime}}^{B}}}} + \sigma^{2}}$ by a Shannon equation, a rate of the user l is expressed as: $R_{l}^{B} = {B{\sum\limits_{n = 1}^{L}{\log_{2}\left( {1 + {SINR}_{l,n}^{B}} \right)}}}$ where, B is a bandwidth; the IRS user rate is calculated: wherein, a signal to interference plus noise ratio (SINR) of the k^(th) user in the m^(th) cluster is: ${SINR}_{i,m,k,n}^{I} = \frac{h_{i,m,k,n}^{I}p_{i,m,k,n}^{I}}{h_{i,m,k,n}^{I}\sum_{n^{\prime} = n - 1,n^{\prime} \neq n}^{n + 1}p_{i,m,k,n^{\prime}}^{I} + h_{i,m,k,n}^{I}\sum_{k^{\prime} = 1}^{k - 1}p_{i,m,k^{\prime},n}^{I} + \sigma^{2}}$ the rate is expressed as: R _(i,m,k) ^(I) =BΣ _(n=1) ^(L+IMK) log₂(1+SINR_(i,m,k,n) ^(I)) the total rate of the system is expressed as: R=Σ _(l=1) ^(L) R _(l) ^(B)+Σ_(i=1) ^(I)Σ_(m=1) ^(M)Σ_(k=1) ^(K) R _(i,m,k) ^(I)
 6. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 5, wherein in Step 4, to maximize overall energy efficiency of a network, the optimization problem for downlink power control and the IRS phase shift adjustment is proposed, wherein total transmission power of the BS is calculated as the sum power of all the users by: $P = {{\sum\limits_{l = 1}^{L}{\sum\limits_{n = 1}^{L}p_{l,n}^{B}}} + {\sum\limits_{i = 1}^{I}{\sum\limits_{m = 1}^{M}{\sum\limits_{k = 1}^{K}{\sum\limits_{n = {L + 1}}^{L + {IMK}}p_{i,m,k,n}^{I}}}}}}$ the energy efficiency of the system network is defined as a ratio of a sum rate to the total power of the network, and the optimization problem is formulated as: $\max\limits_{\varphi_{i,m,g},p_{i,m,k,n}^{I},p_{l,n}^{B}}EE = \frac{R}{P}$, subject to $C_{1}:0 < p_{i,m,k,n}^{I} < P_{T},\forall i \in \mathcal{I},\forall m \in \mathcal{M},\forall k \in \mathcal{K},\forall n \in \mathcal{N}$; $C_{2}:0 < p_{l,n}^{B} < P_{T},\forall l \in \mathcal{L},\forall n \in \mathcal{N}$; $C_{3}:R_{i,m,k}^{I} \geq R_{\min},\forall i \in \mathcal{I},\forall m \in \mathcal{M},\forall k \in \mathcal{K}$; $C_{4}:R_{l}^{B} \geq R_{\min},\forall l \in \mathcal{L}$; $C_{5}:\varphi_{i,m,g} \in \left\lbrack 0,2\pi \right\rbrack,\forall i \in \mathcal{I},\forall m \in \mathcal{M},\forall g \in \mathcal{G}$ where, C₁ and C₂ are power limitations of each user, C₃ and C₄ are minimum rate requirements, and C₅ is an angle range.
 7. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 6, wherein in Step 5, the optimization problem is solved through the MADRL method: virtual agents are introduced into the BS as mappings of the users, and the virtual agents perform training to obtain optimal power and phase shift; a central control unit is configured on the BS to collect user information including channel state information (CSI), phase shift and power; a clock is set to ensure synchronous iteration during agent training, so that overall energy efficiency is calculated after each iteration; and the agents perform training according to the collected user information and real-time iteration results to realize global optimization.
 8. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 7, wherein a Markov process taking a discrete time, a finite state space and an action space into account is used for training; basic elements of reinforcement learning are represented by a tuple ($\mathcal{S}$, $\mathcal{A}$, $\mathcal{R}$, $\mathcal{P}$), where $\mathcal{S}$ represents a state space, $\mathcal{A}$ represents an action space, $\mathcal{R}$ represents a reward function, and $\mathcal{P}$ represents a state transition probability; and the state space and the action space are set as follows: 1) state space: a tuple (φ, p) is defined to represent an angle of the IRS elements and the power of the BS users and the IRS users, and the state space is expressed by a formula $\mathcal{S} = \{ s \mid s = (\varphi,p)\}$, wherein φ={φ₁, . . . , φ_(j), . . . , φ_(G)}; 2) action space: in order to obtain the finite space, the angle and the power are discretized by: $\varphi_{j}:\left\{ 0,\left\{ \varphi_{\min}\left( \frac{\varphi_{\max}}{\varphi_{\min}} \right)^{\frac{i}{|\varphi| - 2}} \mid i = 0,\ldots,|\varphi| - 2 \right\} \right\}$ and $p:\left\{ 0,\left\{ P_{\min}\left( \frac{P_{\max}}{P_{\min}} \right)^{\frac{i}{|P| - 2}} \mid i = 0,\ldots,|P| - 2 \right\} \right\}$ wherein, φ_(min) and φ_(max) are a minimum phase and a maximum phase of the IRS elements, P_(min) and P_(max) are minimum user power and maximum user power, and a discrete quantity of the angle and a discrete quantity of the power are |φ| and |P| respectively; the action space is formed as $\mathcal{A} = \{ a \mid a = (\varphi,p)\}$; 3) reward space: a difference between the overall energy efficiency in a current state and the overall energy efficiency in a previous state is defined as a reward, which is presented as: r_(t)=EE_(t+1)−EE_(t), wherein EE_(t+1) and EE_(t) are energy efficiency in a state s_(t+1) and energy efficiency in a state s_(t) respectively; an optimal strategy π is obtained by the agents to realize a maximum cumulative reward, which is obtained by: $R_{t}\overset{\Delta}{=}\sum\limits_{k = 0}^{\infty}\gamma^{k}r_{t + k + 1}$ where, γ∈(0,1] is a discount factor for future rewards; during training, the agents select an action according to the strategy π; at the state s_(t), the agents take an action a_(t) according to the strategy π, and at this moment, an action-value function Q_(π)(s_(t), a_(t)) of the agents is expressed as: $Q_{\pi}(s_{t},a_{t})\overset{\Delta}{=}E_{\pi}\left\lbrack R_{t} \mid s = s_{t},a = a_{t} \right\rbrack$ according to a Bellman equation, $Q^{*}(s,a)\overset{\Delta}{=}E\left\lbrack r_{t} + \gamma\max\limits_{a^{\prime}}Q^{*}(s^{\prime},a^{\prime}) \mid s = s_{t},a = a_{t} \right\rbrack$ an evaluation of the optimal strategy is expressed as: $Q^{*}(s,a)\overset{\Delta}{=}\max\limits_{\pi}Q^{\pi}(s,a)$ the optimal strategy is obtained by: $\pi^{*} = \arg\max\limits_{a}Q^{*}(s,a)$ to search for an optimal strategy in a large state space and a large action space, a DQN is introduced into MADRL; the optimal strategy and the value function are approximated by one function according to Q_(i)(s, a; θ)≈Q*(s, a), where θ is a weight and is updated by training; the DQN comprises a target network and a current network, which are trained by minimizing a loss function to optimize the parameter θ; the loss function is: $loss(\theta) = \left( y_{t}^{DQN} - Q_{t}(s_{t},a_{t};\theta) \right)^{2}$ where, Q_(t)(s_(t), a_(t); θ) is an output of the current network with the parameter θ at the state s_(t), and y_(t) ^(DQN) is an output of the target network with the parameter {circumflex over (θ)} at the state s_(t+1): $y_{t}^{DQN} = r_{t} + \gamma Q(s_{t + 1},a_{t + 1};\hat{\theta})$ the loss function is minimized through a gradient descent algorithm, and the action-value function is approximated by the neural network until convergence.
 9. The energy efficiency optimization method for an IRS-assisted NOMA THz network according to claim 8, wherein an action-value function Q with a random parameter θ, a target action-value function with a parameter {circumflex over (θ)}=θ, an index T for iteration and an experience pool are initialized; for episode=1 to M 1) initializing the state s_(t) 2) for t=1 to T a. selecting an action by the agents according to $a_{t} = \arg\max\limits_{a}Q(s_{t},a;\theta)$; b. performing the action a_(t) by the agents to switch from the current state s_(t) to the next state s_(t+1); c. obtaining the reward r_(t) through data exchange between the agents and the central control unit; d. forming a tuple (s_(t), a_(t), r_(t), s_(t+1)) by s_(t), a_(t), r_(t), s_(t+1), and saving the tuple (s_(t), a_(t), r_(t), s_(t+1)) in the experience pool; e. randomly selecting a mini-batch of tuples (s_(t), a_(t), r_(t), s_(t+1)) from the experience pool; f. calculating y_(t) ^(DQN) according to $y_{t}^{DQN} = r_{t} + \gamma Q(s_{t + 1},a_{t + 1};\hat{\theta})$; g. updating the parameter θ in loss(θ)=(y_(t) ^(DQN)−Q_(t)(s_(t), a_(t); θ))² through a gradient descent method; h. assigning θ to {circumflex over (θ)} at regular intervals to update the target network, that is {circumflex over (θ)}=θ; i. calculating energy efficiency EE_(t) by the central control unit; j. calculating the reward according to r_(t)=EE_(t+1)−EE_(t); ending the cycle.